{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Configuration Tutorial\n",
    "\n",
    "This tutorial will show how to use Tribuo's configuration and provenance systems to build models on MNIST (because we wouldn't be doing ML without an MNIST demo).\n",
    "We'll focus on logistic regression, show how many different trainers can be stored in the same configuration, and how the provenance system allows the configuration for a specific run to be regenerated.\n",
    "We'll also briefly look at Tribuo's feature transformation system and see how that integrates into configuration and provenance.\n",
    "\n",
    "## Setup\n",
    "You'll need to get a copy of the MNIST dataset in the original IDX format.\n",
    "\n",
    "First the training data:\n",
    "\n",
    "`wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz`\n",
    "\n",
    "Then the test data:\n",
    "\n",
    "`wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz`\n",
    "\n",
    "Tribuo's IDX loader natively reads gzipped files so you don't need to unzip them.\n",
    "\n",
    "It's Java, so first we load in the necessary Tribuo jars. Here we're using the classification experiments jar, along with the json interop jar to read and write the provenance information."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "%jars ./tribuo-classification-experiments-4.3.0-jar-with-dependencies.jar\n",
    "%jars ./tribuo-json-4.3.0-jar-with-dependencies.jar"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now lets import the packages we need. We'll use a few file manipulation things from Java, and then Tribuo's core packages, the transformation packages, the classification package, classification evaluation package, and then a few things that relate to the provenance system."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import java.nio.file.Files;\n",
    "import java.nio.file.Paths;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import org.tribuo.*;\n",
    "import org.tribuo.util.Util;\n",
    "import org.tribuo.transform.*;\n",
    "import org.tribuo.transform.transformations.LinearScalingTransformation;\n",
    "import org.tribuo.classification.*;\n",
    "import org.tribuo.classification.evaluation.*;\n",
    "import com.oracle.labs.mlrg.olcut.config.Configurable;\n",
    "import com.oracle.labs.mlrg.olcut.config.ConfigurationManager;\n",
    "import com.oracle.labs.mlrg.olcut.config.DescribeConfigurable;\n",
    "import com.oracle.labs.mlrg.olcut.provenance.*;\n",
    "import com.oracle.labs.mlrg.olcut.provenance.primitives.*;\n",
    "import com.oracle.labs.mlrg.olcut.config.json.JsonConfigFactory;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default OLCUT's `ConfigurationManager` only understands XML files, the snippet below adds JSON support to all `ConfigurationManager`s in the running JVM. It can be added dynamically on the command line by supplying `--config-file-format <fully-qualified-class-name>` where the class name is for example `com.oracle.labs.mlrg.olcut.config.json.JsonConfigFactory`, if you're using OLCUT's CLI options processing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "ConfigurationManager.addFileFormatFactory(new JsonConfigFactory())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "OLCUT supports XML, JSON, [edn](https://github.com/edn-format/edn), and [protobuf](https://developers.google.com/protocol-buffers) format configuration files. It also supports serialization for `Provenance` objects in XML, JSON, and protobuf formats."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How does configuration work?\n",
    "Tribuo uses a configuration system originally built in Sun Labs, open sourced in the [OLCUT](https://github.com/oracle/olcut) library. Classes which can be configured must implement the `Configurable` interface, and optionally implement a `public void postConfig()` method, which can be used to check invariants after a class has beeen configured but before it's visible. Configurable classes can mark which of their fields are available for configuration using the `@Config` annotation, which accepts three arguments: `boolean mandatory` if the configuration system should error out when the field is not configured, `String description` a description of the field used as a comment and in the `DescribeConfigurable` system seen below, and `boolean redact` which controls if this field value should be saved into configuration files or written into provenance objects.\n",
    "\n",
    "As configuration is part of the class file rather than the public documented API (because it operates on private fields), OLCUT ships with a CLI utility for inspecting a configurable class and generating an example configuration in any supported configuration format. To use this utility from the command line you can run:\n",
    "```\n",
    "$ java -cp <path-to-jars-including-olcut-core> com.oracle.labs.mlrg.olcut.config.DescribeConfigurable -n <class-name> -o -e xml\n",
    "```\n",
    "where the `-n` argument denotes what class to describe, `-o` denotes that an example configuration should be generated, and `-e` gives the file format to emit the example configuration in.\n",
    "\n",
    "You can also use the REPL to inspect a configurable class, like so:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Class: org.tribuo.classification.sgd.linear.LinearSGDTrainer\n",
      "\n",
      "Field Name      Type                                         Mandatory Redact Default                                                       Description\n",
      "epochs          int                                          false     false  5                                                             The number of gradient descent epochs.\n",
      "loggingInterval int                                          false     false  -1                                                            Log values after this many updates.\n",
      "minibatchSize   int                                          false     false  1                                                             Minibatch size in SGD.\n",
      "objective       org.tribuo.classification.sgd.LabelObjective false     false  LogMulticlass                                                 The classification objective function to use.\n",
      "optimiser       org.tribuo.math.StochasticGradientOptimiser  false     false  AdaGrad(initialLearningRate=1.0,epsilon=0.1,initialValue=0.0) The gradient optimiser to use.\n",
      "seed            long                                         false     false  12345                                                         Seed for the RNG used to shuffle elements.\n",
      "shuffle         boolean                                      false     false  true                                                          Shuffle the data before each epoch. Only turn off for debugging.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "var className = \"org.tribuo.classification.sgd.linear.LinearSGDTrainer\";\n",
    "var clazz = (Class<? extends Configurable>) Class.forName(className);\n",
    "var map = DescribeConfigurable.generateFieldInfo(clazz);\n",
    "\n",
    "var output = DescribeConfigurable.generateDescription(map);\n",
    "\n",
    "System.out.println(\"Class: \" + clazz.getCanonicalName() + \"\\n\");\n",
    "System.out.println(DescribeConfigurable.formatDescription(output));"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And also to print out an example config file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"config\" : {\n",
      "    \"components\" : [ {\n",
      "      \"name\" : \"example\",\n",
      "      \"type\" : \"org.tribuo.classification.sgd.linear.LinearSGDTrainer\",\n",
      "      \"export\" : \"false\",\n",
      "      \"import\" : \"false\",\n",
      "      \"properties\" : {\n",
      "        \"seed\" : \"0\",\n",
      "        \"minibatchSize\" : \"0\",\n",
      "        \"shuffle\" : \"false\",\n",
      "        \"epochs\" : \"0\",\n",
      "        \"optimiser\" : \"StochasticGradientOptimiser-instance\",\n",
      "        \"loggingInterval\" : \"0\",\n",
      "        \"objective\" : \"LabelObjective-instance\"\n",
      "      }\n",
      "    } ]\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "ByteArrayOutputStream writer = new ByteArrayOutputStream();\n",
    "DescribeConfigurable.writeExampleConfig(writer,\"json\",clazz,map);\n",
    "System.out.println(writer.toString(\"UTF-8\"));"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using a configuration file\n",
    "We're going to read in an example configuration file, in JSON format. This configuration knows about a bunch of different trainers, and also the training and testing MNIST data sources. In the tutorials directory we supply both the JSON and XML versions of this file, and the remainder of this tutorial is completely agnostic to which one is used."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{\n",
       "  \"config\" : {\n",
       "    \"components\" : [ {\n",
       "      \"name\" : \"mnist-test\",\n",
       "      \"type\" : \"org.tribuo.datasource.IDXDataSource\",\n",
       "      \"export\" : \"false\",\n",
       "      \"import\" : \"false\",\n",
       "      \"properties\" : {\n",
       "        \"featuresPath\" : \"t10k-images-idx3-ubyte.gz\",\n",
       "        \"outputPath\" : \"t10k-labels-idx1-ubyte.gz\",\n",
       "        \"outputFactory\" : \"label-factory\"\n",
       "      }\n",
       "    }, {\n",
       "      \"name\" : \"mnist-train\",\n",
       "      \"type\" : \"org.tribuo.datasource.IDXDataSource\",\n",
       "      \"export\" : \"false\",\n",
       "      \"import\" : \"false\",\n",
       "      \"properties\" : {\n",
       "        \"featuresPath\" : \"train-images-idx3-ubyte.gz\",\n",
       "        \"outputPath\" : \"train-labels-idx1-ubyte.gz\",\n",
       "        \"outputFactory\" : \"label-factory\"\n",
       "      }\n",
       "    }, {\n",
       "      \"name\" : \"adagrad\",\n",
       "      \"type\" : \"org.tribuo.math.optimisers.AdaGrad\",\n",
       "      \"export\" : \"false\",\n",
       "      \"import\" : \"false\",\n",
       "      \"properties\" : {\n",
       "        \"epsilon\" : \"0.01\",\n",
       "        \"initialLearningRate\" : \"0.5\"\n",
       "      }\n",
       "    }, {\n",
       "      \"name\" : \"log\",\n",
       "      \"type\" : \"org.tribuo.classification.sgd.objectives.LogMulticlass\",\n",
       "      \"export\" : \"false\",\n",
       "      \"import\" : \"false\"\n",
       "    }, {\n",
       "      \"name\" : \"label-factory\",\n",
       "      \"type\" : \"org.tribuo.classification.LabelFactory\",\n",
       "      \"export\" : \"false\",\n",
       "      \"import\" : \"false\"\n",
       "    }, {\n",
       "      \"name\" : \"gini\",\n",
       "      \"type\" : \"org.tribuo.classification.dtree.impurity.GiniIndex\",\n",
       "      \"export\" : \"false\",\n",
       "      \"import\" : \"false\"\n",
       "    }, {\n",
       "      \"name\" : \"cart\",\n",
       "      \"type\" : \"org.tribuo.classification.dtree.CARTClassificationTrainer\",\n",
       "      \"export\" : \"false\",\n",
       "      \"import\" : \"false\",\n",
       "      \"properties\" : {\n",
       "        \"maxDepth\" : \"6\",\n",
       "        \"impurity\" : \"gini\",\n",
       "        \"seed\" : \"12345\",\n",
       "        \"fractionFeaturesInSplit\" : \"0.5\"\n",
       "      }\n",
       "    }, {\n",
       "      \"name\" : \"entropy\",\n",
       "      \"type\" : \"org.tribuo.classification.dtree.impurity.Entropy\",\n",
       "      \"export\" : \"false\",\n",
       "      \"import\" : \"false\"\n",
       "    }, {\n",
       "      \"name\" : \"logistic\",\n",
       "      \"type\" : \"org.tribuo.classification.sgd.linear.LinearSGDTrainer\",\n",
       "      \"export\" : \"false\",\n",
       "      \"import\" : \"false\",\n",
       "      \"properties\" : {\n",
       "        \"seed\" : \"1\",\n",
       "        \"minibatchSize\" : \"1\",\n",
       "        \"epochs\" : \"2\",\n",
       "        \"optimiser\" : \"adagrad\",\n",
       "        \"objective\" : \"log\",\n",
       "        \"loggingInterval\" : \"10000\"\n",
       "      }\n",
       "    }, {\n",
       "      \"name\" : \"xgboost\",\n",
       "      \"type\" : \"org.tribuo.classification.xgboost.XGBoostClassificationTrainer\",\n",
       "      \"export\" : \"false\",\n",
       "      \"import\" : \"false\",\n",
       "      \"properties\" : {\n",
       "        \"numTrees\" : \"10\",\n",
       "        \"maxDepth\" : \"4\",\n",
       "        \"eta\" : \"0.5\",\n",
       "        \"seed\" : \"1\",\n",
       "        \"minChildWeight\" : \"1.0\",\n",
       "        \"subsample\" : \"1.0\",\n",
       "        \"nThread\" : \"6\",\n",
       "        \"gamma\" : \"0.1\"\n",
       "      }\n",
       "    } ]\n",
       "  }\n",
       "}"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var configPath = Paths.get(\"configuration\",\"example-config.json\");\n",
    "String.join(\"\\n\",Files.readAllLines(configPath));"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we'll make a `ConfigurationManager` and hand it the configuration file to load. Our configuration system also supports CLI options which can load things out of the supplied configuration files. We have examples of this in each of the simple `TrainTest` demo classes in each prediction backend."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "var cm = new ConfigurationManager(configPath.toString());"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First we'll load in the training and testing `DataSource`s (as instances of `IDXDataSource`), pass them into two `Dataset`s to aggregate the appropriate metadata, and we'll make the evaluator for later use."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training data size = 60000, number of features = 717, number of classes = 10\n",
      "Testing data size = 10000, number of features = 668, number of classes = 10\n"
     ]
    }
   ],
   "source": [
    "DataSource<Label> mnistTrain = (DataSource<Label>) cm.lookup(\"mnist-train\");\n",
    "DataSource<Label> mnistTest = (DataSource<Label>) cm.lookup(\"mnist-test\");\n",
    "var trainData = new MutableDataset<>(mnistTrain);\n",
    "var testData = new MutableDataset<>(mnistTest);\n",
    "var evaluator = new LabelEvaluator();\n",
    "System.out.println(String.format(\"Training data size = %d, number of features = %d, number of classes = %d\",trainData.size(),trainData.getFeatureMap().size(),trainData.getOutputInfo().size()));\n",
    "System.out.println(String.format(\"Testing data size = %d, number of features = %d, number of classes = %d\",testData.size(),testData.getFeatureMap().size(),testData.getOutputInfo().size()));"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading in trainers from the configuration\n",
    "Our configuration file contains a number of different trainers, so let's pull them out and take a look.\n",
    "\n",
    "The first one we'll see is a CART decision tree, with a max tree depth of 6."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "CARTClassificationTrainer(maxDepth=6,minChildWeight=5.0,minImpurityDecrease=0.0,fractionFeaturesInSplit=0.5,useRandomSplitPoints=false,impurity=GiniIndex,seed=12345)"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var cart = (Trainer<Label>) cm.lookup(\"cart\");\n",
    "cart"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next we'll load an XGBoost trainer, using 10 trees, 6 computation threads, and some regularisation parameters. Note: Tribuo's XGBoost support relies upon the Maven Central XGBoost jar from DMLC which contains macOS and Linux binaries, on Windows please compile DMLC's XGBoost jar from source and rebuild Tribuo."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "XGBoostTrainer(numTrees=10,parameters{colsample_bytree=1.0, tree_method=auto, seed=1, max_depth=4, booster=gbtree, objective=multi:softprob, lambda=1.0, eta=0.5, nthread=6, alpha=1.0, subsample=1.0, gamma=0.1, min_child_weight=1.0, verbosity=0})"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var xgb = (Trainer<Label>) cm.lookup(\"xgboost\");\n",
    "xgb"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally we'll load in a logistic regression trainer, using AdaGrad as the gradient optimizer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "LinearSGDTrainer(objective=LogMulticlass,optimiser=AdaGrad(initialLearningRate=0.5,epsilon=0.01,initialValue=0.0),epochs=2,minibatchSize=1,seed=1)"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var logistic = (Trainer<Label>) cm.lookup(\"logistic\");\n",
    "logistic"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also load a list in containing all the `Trainer` implementations in this config file. Note: the config system by default returns the same instance when it's queried for the same named config. So the list contains references to the objects we've already loaded."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loaded 3 trainers.\n"
     ]
    }
   ],
   "source": [
    "var trainers = (List<Trainer>) cm.lookupAll(Trainer.class);\n",
    "System.out.println(\"Loaded \" + trainers.size() + \" trainers.\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training the model and extracting configuration\n",
    "We're going to focus on the logistic regression trainer now, so let's train a logistic regression model on our MNIST training set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training logistic regression took (00:00:04:874)\n"
     ]
    }
   ],
   "source": [
    "var lrStartTime = System.currentTimeMillis();\n",
    "var lrModel = logistic.train(trainData);\n",
    "var lrEndTime = System.currentTimeMillis();\n",
    "System.out.println(\"Training logistic regression took \" + Util.formatDuration(lrStartTime,lrEndTime));"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can inspect the trained model for it's provenance, as we saw in the Classification tutorial.\n",
    "\n",
    "The new step is extracting a configuration from that provenance. The `ProvenanceUtil.extractConfiguration()` call returns a `List<ConfigurationData>` which is the object representation of a configuration file. We can see that it's extracted configurations for 5 objects from our single model, we'll look at those after we've written out the file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var provenance = lrModel.getProvenance();\n",
    "var provConfig = ProvenanceUtil.extractConfiguration(provenance);\n",
    "provConfig.size()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `ConfigurationManager` is the way we can generate a configuration file from the object representation.\n",
    "We create a new `ConfigurationManager`, add the configuration we extracted from the provenance, and then write\n",
    "it out to a new JSON file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{\n",
       "  \"config\" : {\n",
       "    \"components\" : [ {\n",
       "      \"name\" : \"idxdatasource-1\",\n",
       "      \"type\" : \"org.tribuo.datasource.IDXDataSource\",\n",
       "      \"export\" : \"false\",\n",
       "      \"import\" : \"false\",\n",
       "      \"properties\" : {\n",
       "        \"outputPath\" : \"/local/ExternalRepositories/tribuo/tutorials/train-labels-idx1-ubyte.gz\",\n",
       "        \"outputFactory\" : \"labelfactory-4\",\n",
       "        \"featuresPath\" : \"/local/ExternalRepositories/tribuo/tutorials/train-images-idx3-ubyte.gz\"\n",
       "      }\n",
       "    }, {\n",
       "      \"name\" : \"linearsgdtrainer-0\",\n",
       "      \"type\" : \"org.tribuo.classification.sgd.linear.LinearSGDTrainer\",\n",
       "      \"export\" : \"false\",\n",
       "      \"import\" : \"false\",\n",
       "      \"properties\" : {\n",
       "        \"seed\" : \"1\",\n",
       "        \"minibatchSize\" : \"1\",\n",
       "        \"shuffle\" : \"true\",\n",
       "        \"epochs\" : \"2\",\n",
       "        \"optimiser\" : \"adagrad-2\",\n",
       "        \"loggingInterval\" : \"10000\",\n",
       "        \"objective\" : \"logmulticlass-3\"\n",
       "      }\n",
       "    }, {\n",
       "      \"name\" : \"adagrad-2\",\n",
       "      \"type\" : \"org.tribuo.math.optimisers.AdaGrad\",\n",
       "      \"export\" : \"false\",\n",
       "      \"import\" : \"false\",\n",
       "      \"properties\" : {\n",
       "        \"epsilon\" : \"0.01\",\n",
       "        \"initialLearningRate\" : \"0.5\",\n",
       "        \"initialValue\" : \"0.0\"\n",
       "      }\n",
       "    }, {\n",
       "      \"name\" : \"labelfactory-4\",\n",
       "      \"type\" : \"org.tribuo.classification.LabelFactory\",\n",
       "      \"export\" : \"false\",\n",
       "      \"import\" : \"false\"\n",
       "    }, {\n",
       "      \"name\" : \"logmulticlass-3\",\n",
       "      \"type\" : \"org.tribuo.classification.sgd.objectives.LogMulticlass\",\n",
       "      \"export\" : \"false\",\n",
       "      \"import\" : \"false\"\n",
       "    } ]\n",
       "  }\n",
       "}"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var outputFile = \"mnist-logistic-config.json\";\n",
    "var newCM = new ConfigurationManager();\n",
    "newCM.addConfiguration(provConfig);\n",
    "newCM.save(new File(outputFile),true);\n",
    "String.join(\"\\n\",Files.readAllLines(Paths.get(outputFile)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The five elements of the configuration are: the training data \"idxdatasource-1\", the logistic regression \"linearsgdtrainer-0\", the training log loss function \"logmulticlass-3\", the AdaGrad gradient optimizer \"adagrad-2\", and the label factory \"labelfactory-4\". The only unexpected part is the `LabelFactory` which is the factory that converts `String`s into `Label` instances.\n",
    "\n",
    "## Rebuilding a model from it's configuration\n",
    "\n",
    "Now to reconstruct our model, we can load in the Trainer and DataSource from the new `ConfigurationManager`, pass the source into a `Dataset`, and finally call train on the new trainer supplying the new dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "var newTrainer = (Trainer<Label>) newCM.lookup(\"linearsgdtrainer-0\");\n",
    "var newSource = (DataSource<Label>) newCM.lookup(\"idxdatasource-1\");\n",
    "var newDataset = new MutableDataset<>(newSource);\n",
    "var newModel = newTrainer.train(newDataset, Map.of(\"reconfigured-model\",new BooleanProvenance(\"reconfigured-model\",true)));"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First we'll confirm that the old model and new models aren't equal (as they have different timestamps, among other provenance checks)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "false"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lrModel.getProvenance().equals(newModel.getProvenance())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we'll evaluate the first model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Class                           n          tp          fn          fp      recall        prec          f1\n",
      "0                             980         904          76          21       0.922       0.977       0.949\n",
      "1                           1,135       1,072          63          18       0.944       0.983       0.964\n",
      "2                           1,032         856         176          56       0.829       0.939       0.881\n",
      "3                           1,010         844         166          84       0.836       0.909       0.871\n",
      "4                             982         888          94          72       0.904       0.925       0.915\n",
      "5                             892         751         141         143       0.842       0.840       0.841\n",
      "6                             958         938          20         139       0.979       0.871       0.922\n",
      "7                           1,028         963          65         133       0.937       0.879       0.907\n",
      "8                             974         892          82         363       0.916       0.711       0.800\n",
      "9                           1,009         801         208          62       0.794       0.928       0.856\n",
      "Total                      10,000       8,909       1,091       1,091\n",
      "Accuracy                                                                    0.891\n",
      "Micro Average                                                               0.891       0.891       0.891\n",
      "Macro Average                                                               0.890       0.896       0.890\n",
      "Balanced Error Rate                                                         0.110\n",
      "               0       1       2       3       4       5       6       7       8       9\n",
      "0            904       0       2       3       1      20      26       4      18       2\n",
      "1              0   1,072       7       3       0       2       6       2      43       0\n",
      "2              3       6     856      26       5       7      39       8      80       2\n",
      "3              1       0      13     844       2      64       7      14      62       3\n",
      "4              0       0       7       2     888       1      22      15      20      27\n",
      "5              9       1       1      27       6     751      18       7      68       4\n",
      "6              3       1       2       1       1       9     938       1       2       0\n",
      "7              1       5      18       6       4       1       0     963       9      21\n",
      "8              1       3       6       9       9      25      20       6     892       3\n",
      "9              3       2       0       7      44      14       1      76      61     801\n",
      "\n"
     ]
    }
   ],
   "source": [
    "var lrEvaluator = evaluator.evaluate(lrModel,testData);\n",
    "System.out.println(lrEvaluator.toString());\n",
    "System.out.println(lrEvaluator.getConfusionMatrix().toString());"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It's about what we'd expect for a linear model on MNIST. Not state-of-the-art (SOTA), but it'll do for now.\n",
    "\n",
    "Now let's check the new model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Class                           n          tp          fn          fp      recall        prec          f1\n",
      "0                             980         904          76          21       0.922       0.977       0.949\n",
      "1                           1,135       1,072          63          18       0.944       0.983       0.964\n",
      "2                           1,032         856         176          56       0.829       0.939       0.881\n",
      "3                           1,010         844         166          84       0.836       0.909       0.871\n",
      "4                             982         888          94          72       0.904       0.925       0.915\n",
      "5                             892         751         141         143       0.842       0.840       0.841\n",
      "6                             958         938          20         139       0.979       0.871       0.922\n",
      "7                           1,028         963          65         133       0.937       0.879       0.907\n",
      "8                             974         892          82         363       0.916       0.711       0.800\n",
      "9                           1,009         801         208          62       0.794       0.928       0.856\n",
      "Total                      10,000       8,909       1,091       1,091\n",
      "Accuracy                                                                    0.891\n",
      "Micro Average                                                               0.891       0.891       0.891\n",
      "Macro Average                                                               0.890       0.896       0.890\n",
      "Balanced Error Rate                                                         0.110\n",
      "               0       1       2       3       4       5       6       7       8       9\n",
      "0            904       0       2       3       1      20      26       4      18       2\n",
      "1              0   1,072       7       3       0       2       6       2      43       0\n",
      "2              3       6     856      26       5       7      39       8      80       2\n",
      "3              1       0      13     844       2      64       7      14      62       3\n",
      "4              0       0       7       2     888       1      22      15      20      27\n",
      "5              9       1       1      27       6     751      18       7      68       4\n",
      "6              3       1       2       1       1       9     938       1       2       0\n",
      "7              1       5      18       6       4       1       0     963       9      21\n",
      "8              1       3       6       9       9      25      20       6     892       3\n",
      "9              3       2       0       7      44      14       1      76      61     801\n",
      "\n"
     ]
    }
   ],
   "source": [
    "var newEvaluator = evaluator.evaluate(newModel,testData);\n",
    "System.out.println(newEvaluator.toString());\n",
    "System.out.println(newEvaluator.getConfusionMatrix().toString());"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see that both models perform identically. This is because our provenance system records the RNG seeds used at all points, and Tribuo is scrupulous about how and when it uses PRNGs. If you find a model reconstruction that gives a different answer, please file an issue on our GitHub as that's a bug. The exceptions are XGBoost and TensorFlow, both of which have some non-determinism beyond our control.\n",
    "\n",
    "We provide a simple push-button replication facility in the `tribuo-reproducibility` project; see the tutorial on reproducibility for more details.\n",
    "\n",
    "## What else lives in the Provenance?\n",
    "\n",
    "These evaluations have provenance in the same way the models do, and we can use a pretty printer in OLCUT to make it a little more human readable.\n",
    "\n",
    "In addition to the configuration information like the gradient optimiser and RNG seed, the provenance includes run-specific information like the \"reconfigured-model\" flag we added, along with a hash of the data, timestamps for the various data files involved, and timestamps for the model creation and dataset creation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "EvaluationProvenance(\n",
      "\tclass-name = org.tribuo.provenance.EvaluationProvenance\n",
      "\tmodel-provenance = LinearSGDModel(\n",
      "\t\t\tclass-name = org.tribuo.classification.sgd.linear.LinearSGDModel\n",
      "\t\t\tdataset = MutableDataset(\n",
      "\t\t\t\t\tclass-name = org.tribuo.MutableDataset\n",
      "\t\t\t\t\tdatasource = IDXDataSource(\n",
      "\t\t\t\t\t\t\tclass-name = org.tribuo.datasource.IDXDataSource\n",
      "\t\t\t\t\t\t\toutputPath = /local/ExternalRepositories/tribuo/tutorials/train-labels-idx1-ubyte.gz\n",
      "\t\t\t\t\t\t\toutputFactory = LabelFactory(\n",
      "\t\t\t\t\t\t\t\t\tclass-name = org.tribuo.classification.LabelFactory\n",
      "\t\t\t\t\t\t\t\t)\n",
      "\t\t\t\t\t\t\tfeaturesPath = /local/ExternalRepositories/tribuo/tutorials/train-images-idx3-ubyte.gz\n",
      "\t\t\t\t\t\t\tfeatures-file-modified-time = 2000-07-21T14:20:24-04:00\n",
      "\t\t\t\t\t\t\toutput-resource-hash = 3552534A0A558BBED6AED32B30C495CCA23D567EC52CAC8BE1A0730E8010255C\n",
      "\t\t\t\t\t\t\tdatasource-creation-time = 2022-10-07T11:33:57.506314-04:00\n",
      "\t\t\t\t\t\t\toutput-file-modified-time = 2000-07-21T14:20:27-04:00\n",
      "\t\t\t\t\t\t\tidx-feature-type = UBYTE\n",
      "\t\t\t\t\t\t\tfeatures-resource-hash = 440FCABF73CC546FA21475E81EA370265605F56BE210A4024D2CA8F203523609\n",
      "\t\t\t\t\t\t\thost-short-name = DataSource\n",
      "\t\t\t\t\t\t)\n",
      "\t\t\t\t\ttransformations = List[]\n",
      "\t\t\t\t\tis-sequence = false\n",
      "\t\t\t\t\tis-dense = false\n",
      "\t\t\t\t\tnum-examples = 60000\n",
      "\t\t\t\t\tnum-features = 717\n",
      "\t\t\t\t\tnum-outputs = 10\n",
      "\t\t\t\t\ttribuo-version = 4.3.0\n",
      "\t\t\t\t)\n",
      "\t\t\ttrainer = LinearSGDTrainer(\n",
      "\t\t\t\t\tclass-name = org.tribuo.classification.sgd.linear.LinearSGDTrainer\n",
      "\t\t\t\t\tseed = 1\n",
      "\t\t\t\t\tminibatchSize = 1\n",
      "\t\t\t\t\tshuffle = true\n",
      "\t\t\t\t\tepochs = 2\n",
      "\t\t\t\t\toptimiser = AdaGrad(\n",
      "\t\t\t\t\t\t\tclass-name = org.tribuo.math.optimisers.AdaGrad\n",
      "\t\t\t\t\t\t\tepsilon = 0.01\n",
      "\t\t\t\t\t\t\tinitialLearningRate = 0.5\n",
      "\t\t\t\t\t\t\tinitialValue = 0.0\n",
      "\t\t\t\t\t\t\thost-short-name = StochasticGradientOptimiser\n",
      "\t\t\t\t\t\t)\n",
      "\t\t\t\t\tloggingInterval = 10000\n",
      "\t\t\t\t\tobjective = LogMulticlass(\n",
      "\t\t\t\t\t\t\tclass-name = org.tribuo.classification.sgd.objectives.LogMulticlass\n",
      "\t\t\t\t\t\t\thost-short-name = LabelObjective\n",
      "\t\t\t\t\t\t)\n",
      "\t\t\t\t\ttribuo-version = 4.3.0\n",
      "\t\t\t\t\ttrain-invocation-count = 0\n",
      "\t\t\t\t\tis-sequence = false\n",
      "\t\t\t\t\thost-short-name = Trainer\n",
      "\t\t\t\t)\n",
      "\t\t\ttrained-at = 2022-10-07T11:34:03.181752-04:00\n",
      "\t\t\tinstance-values = Map{\n",
      "\t\t\t\treconfigured-model=true\n",
      "\t\t\t}\n",
      "\t\t\ttribuo-version = 4.3.0\n",
      "\t\t\tjava-version = 12\n",
      "\t\t\tos-name = Linux\n",
      "\t\t\tos-arch = amd64\n",
      "\t\t)\n",
      "\tdataset-provenance = MutableDataset(\n",
      "\t\t\tclass-name = org.tribuo.MutableDataset\n",
      "\t\t\tdatasource = IDXDataSource(\n",
      "\t\t\t\t\tclass-name = org.tribuo.datasource.IDXDataSource\n",
      "\t\t\t\t\toutputPath = /local/ExternalRepositories/tribuo/tutorials/t10k-labels-idx1-ubyte.gz\n",
      "\t\t\t\t\toutputFactory = LabelFactory(\n",
      "\t\t\t\t\t\t\tclass-name = org.tribuo.classification.LabelFactory\n",
      "\t\t\t\t\t\t)\n",
      "\t\t\t\t\tfeaturesPath = /local/ExternalRepositories/tribuo/tutorials/t10k-images-idx3-ubyte.gz\n",
      "\t\t\t\t\tfeatures-file-modified-time = 2000-07-21T14:19:56-04:00\n",
      "\t\t\t\t\toutput-resource-hash = F7AE60F92E00EC6DEBD23A6088C31DBD2371ECA3FFA0DEFAEFB259924204AEC6\n",
      "\t\t\t\t\tdatasource-creation-time = 2022-10-07T11:33:44.880399-04:00\n",
      "\t\t\t\t\toutput-file-modified-time = 2000-07-21T14:20:05-04:00\n",
      "\t\t\t\t\tidx-feature-type = UBYTE\n",
      "\t\t\t\t\tfeatures-resource-hash = 8D422C7B0A1C1C79245A5BCF07FE86E33EEAFEE792B84584AEC276F5A2DBC4E6\n",
      "\t\t\t\t\thost-short-name = DataSource\n",
      "\t\t\t\t)\n",
      "\t\t\ttransformations = List[]\n",
      "\t\t\tis-sequence = false\n",
      "\t\t\tis-dense = false\n",
      "\t\t\tnum-examples = 10000\n",
      "\t\t\tnum-features = 668\n",
      "\t\t\tnum-outputs = 10\n",
      "\t\t\ttribuo-version = 4.3.0\n",
      "\t\t)\n",
      "\ttribuo-version = 4.3.0\n",
      ")\n"
     ]
    }
   ],
   "source": [
    "var evalProvenance = newEvaluator.getProvenance();\n",
    "System.out.println(ProvenanceUtil.formattedProvenanceString(evalProvenance));"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Feature Transformations\n",
    "\n",
    "We can take the new trainer, wrap it programmatically in a `TransformTrainer` which rescales the input features into the range `[0,1]`, and still generate provenance and configuration automatically as the model is trained."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training transformed logistic regression took (00:00:06:555)\n"
     ]
    }
   ],
   "source": [
    "var transformations = new TransformationMap(List.of(new LinearScalingTransformation(0,1)));\n",
    "var transformed = new TransformTrainer(newTrainer,transformations);\n",
    "var transformStart = System.currentTimeMillis();\n",
    "var transformedModel = transformed.train(newDataset);\n",
    "var transformEnd = System.currentTimeMillis();\n",
    "System.out.println(\"Training transformed logistic regression took \" + Util.formatDuration(transformStart,transformEnd));"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we'll evaluate the rescaled model. Here we see that rescaling the data into the zero-one range improves the linear model's accuracy by a couple of percentage points (from 0.891 to 0.915 on the test set), as all the data is now on the same scale. As expected it's still not SOTA, but we're not using a huge CNN or some other complex model. For that you can try out our [TensorFlow interface](https://github.com/oracle/tribuo/blob/main/tutorials/tensorflow-tribuo-v4.ipynb), or use the XGBoost trainer we loaded in from the original configuration file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Class                           n          tp          fn          fp      recall        prec          f1\n",
      "0                             980         957          23          40       0.977       0.960       0.968\n",
      "1                           1,135       1,109          26          36       0.977       0.969       0.973\n",
      "2                           1,032         940          92          90       0.911       0.913       0.912\n",
      "3                           1,010         927          83         141       0.918       0.868       0.892\n",
      "4                             982         914          68          73       0.931       0.926       0.928\n",
      "5                             892         813          79         183       0.911       0.816       0.861\n",
      "6                             958         892          66          45       0.931       0.952       0.941\n",
      "7                           1,028         918         110          54       0.893       0.944       0.918\n",
      "8                             974         753         221          60       0.773       0.926       0.843\n",
      "9                           1,009         926          83         129       0.918       0.878       0.897\n",
      "Total                      10,000       9,149         851         851\n",
      "Accuracy                                                                    0.915\n",
      "Micro Average                                                               0.915       0.915       0.915\n",
      "Macro Average                                                               0.914       0.915       0.913\n",
      "Balanced Error Rate                                                         0.086\n",
      "               0       1       2       3       4       5       6       7       8       9\n",
      "0            957       0       1       2       1      12       4       2       1       0\n",
      "1              0   1,109      10       3       0       2       3       2       6       0\n",
      "2              4       9     940      18       9       7      11      11      19       4\n",
      "3              6       0      25     927       0      26       2       7       9       8\n",
      "4              1       1       7       4     914       0       9       7       4      35\n",
      "5              7       1       2      30       8     813       9       3      18       1\n",
      "6              8       2      14       3       8      27     892       2       2       0\n",
      "7              1       7      17      19       8       1       0     918       1      56\n",
      "8              7       9      13      46      11      93       7      10     753      25\n",
      "9              6       7       1      16      28      15       0      10       0     926\n",
      "\n"
     ]
    }
   ],
   "source": [
    "LabelEvaluation transformedEvaluator = evaluator.evaluate(transformedModel,testData);\n",
    "System.out.println(transformedEvaluator.toString());\n",
    "System.out.println(transformedEvaluator.getConfusionMatrix().toString());"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can emit a configuration which includes both the transformation trainer and the original trainer pulled from the old configuration. We'll write it out to a byte array rather than putting it on disk, but the process is the same."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{\n",
       "  \"config\" : {\n",
       "    \"components\" : [ {\n",
       "      \"name\" : \"linearscalingtransformation-4\",\n",
       "      \"type\" : \"org.tribuo.transform.transformations.LinearScalingTransformation\",\n",
       "      \"export\" : \"false\",\n",
       "      \"import\" : \"false\",\n",
       "      \"properties\" : {\n",
       "        \"targetMax\" : \"1.0\",\n",
       "        \"targetMin\" : \"0.0\"\n",
       "      }\n",
       "    }, {\n",
       "      \"name\" : \"labelfactory-7\",\n",
       "      \"type\" : \"org.tribuo.classification.LabelFactory\",\n",
       "      \"export\" : \"false\",\n",
       "      \"import\" : \"false\"\n",
       "    }, {\n",
       "      \"name\" : \"adagrad-5\",\n",
       "      \"type\" : \"org.tribuo.math.optimisers.AdaGrad\",\n",
       "      \"export\" : \"false\",\n",
       "      \"import\" : \"false\",\n",
       "      \"properties\" : {\n",
       "        \"epsilon\" : \"0.01\",\n",
       "        \"initialLearningRate\" : \"0.5\",\n",
       "        \"initialValue\" : \"0.0\"\n",
       "      }\n",
       "    }, {\n",
       "      \"name\" : \"linearsgdtrainer-2\",\n",
       "      \"type\" : \"org.tribuo.classification.sgd.linear.LinearSGDTrainer\",\n",
       "      \"export\" : \"false\",\n",
       "      \"import\" : \"false\",\n",
       "      \"properties\" : {\n",
       "        \"seed\" : \"1\",\n",
       "        \"minibatchSize\" : \"1\",\n",
       "        \"shuffle\" : \"true\",\n",
       "        \"epochs\" : \"2\",\n",
       "        \"optimiser\" : \"adagrad-5\",\n",
       "        \"loggingInterval\" : \"10000\",\n",
       "        \"objective\" : \"logmulticlass-6\"\n",
       "      }\n",
       "    }, {\n",
       "      \"name\" : \"transformtrainer-0\",\n",
       "      \"type\" : \"org.tribuo.transform.TransformTrainer\",\n",
       "      \"export\" : \"false\",\n",
       "      \"import\" : \"false\",\n",
       "      \"properties\" : {\n",
       "        \"includeImplicitZeroFeatures\" : \"false\",\n",
       "        \"transformations\" : \"transformationmap-1\",\n",
       "        \"densify\" : \"false\",\n",
       "        \"innerTrainer\" : \"linearsgdtrainer-2\"\n",
       "      }\n",
       "    }, {\n",
       "      \"name\" : \"logmulticlass-6\",\n",
       "      \"type\" : \"org.tribuo.classification.sgd.objectives.LogMulticlass\",\n",
       "      \"export\" : \"false\",\n",
       "      \"import\" : \"false\"\n",
       "    }, {\n",
       "      \"name\" : \"idxdatasource-3\",\n",
       "      \"type\" : \"org.tribuo.datasource.IDXDataSource\",\n",
       "      \"export\" : \"false\",\n",
       "      \"import\" : \"false\",\n",
       "      \"properties\" : {\n",
       "        \"outputPath\" : \"/local/ExternalRepositories/tribuo/tutorials/train-labels-idx1-ubyte.gz\",\n",
       "        \"outputFactory\" : \"labelfactory-7\",\n",
       "        \"featuresPath\" : \"/local/ExternalRepositories/tribuo/tutorials/train-images-idx3-ubyte.gz\"\n",
       "      }\n",
       "    }, {\n",
       "      \"name\" : \"transformationmap-1\",\n",
       "      \"type\" : \"org.tribuo.transform.TransformationMap\",\n",
       "      \"export\" : \"false\",\n",
       "      \"import\" : \"false\",\n",
       "      \"properties\" : {\n",
       "        \"featureTransformationList\" : { },\n",
       "        \"globalTransformations\" : [ {\n",
       "          \"item\" : \"linearscalingtransformation-4\"\n",
       "        } ]\n",
       "      }\n",
       "    } ]\n",
       "  }\n",
       "}"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var transformedProvConfig = ProvenanceUtil.extractConfiguration(transformedModel.getProvenance());\n",
    "var baos = new ByteArrayOutputStream();\n",
    "newCM = new ConfigurationManager();\n",
    "newCM.addConfiguration(transformedProvConfig);\n",
    "newCM.save(baos,\"json\",true);\n",
    "baos.toString();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Aside from the names (which have different tag numbers) we can see that this configuration is identical to the previous one, but with the addition of `transformtrainer-0` and its dependencies."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "We've taken a closer look at Tribuo's configuration and provenance systems, showing how to train a model using a configuration file, how to inspect the model's provenance, how to extract its configuration, and finally how to combine that extracted configuration with other programmatic elements of the Tribuo library (in this case the feature transformation system). We saw that the provenance combines both the configuration of the trainer and the datasource, along with runtime information extracted from the dataset itself (e.g., timestamps and file hashes). Tribuo's provenance objects are also persisted in ONNX model files exported from Tribuo, and these provenances can be recovered later using Tribuo's `ONNXExternalModel` class, which provides ONNX model inference. For more details on ONNX export see the ONNX export and deployment tutorial.\n",
    "\n",
    "Tribuo's configuration system is integrated into a CLI options/arguments parsing system, which can be used to override elements from the configuration file. The values from the options are then stored in the `ConfigurationManager` and appear in the provenance and downstream configuration objects as expected. Tribuo also provides a redaction system for configuration files (e.g., to ensure a password isn't stored in the provenance) and for provenance objects themselves (e.g., to remove the data provenance from a trained model), which aids model deployment to untrusted or less trusted systems."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Java",
   "language": "java",
   "name": "java"
  },
  "language_info": {
   "codemirror_mode": "java",
   "file_extension": ".jshell",
   "mimetype": "text/x-java-source",
   "name": "Java",
   "pygments_lexer": "java",
   "version": "12+33"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
